Energetic Natural Gradient Descent

Abstract

In this appendix we show that $\frac{1}{2}\Delta^\top F(\theta)\Delta$ is a second-order Taylor approximation of $D_{\mathrm{KL}}\!\left(p(\theta)\,\|\,p(\theta+\Delta)\right)$. First, let

$$g_q(\theta) := D_{\mathrm{KL}}\!\left(q\,\|\,p(\theta)\right) = \sum_{\omega\in\Omega} q(\omega)\ln\frac{q(\omega)}{p(\omega\mid\theta)}.$$

We begin by deriving equations for the Jacobian and Hessian of $g_q$ at $\theta$:

$$\frac{\partial g_q(\theta)}{\partial\theta}
= \sum_{\omega\in\Omega} q(\omega)\,\frac{p(\omega\mid\theta)}{q(\omega)}\,\frac{\partial}{\partial\theta}\frac{q(\omega)}{p(\omega\mid\theta)}
= \sum_{\omega\in\Omega} q(\omega)\,\frac{p(\omega\mid\theta)}{q(\omega)}\,\frac{-q(\omega)\,\frac{\partial p(\omega\mid\theta)}{\partial\theta}}{p(\omega\mid\theta)^2}
= -\sum_{\omega\in\Omega}\frac{q(\omega)}{p(\omega\mid\theta)}\frac{\partial p(\omega\mid\theta)}{\partial\theta}, \tag{4}$$

and so:

$$\begin{aligned}
\frac{\partial^2 g_q(\theta)}{\partial\theta^2}
&= \frac{\partial}{\partial\theta}\frac{\partial g_q(\theta)}{\partial\theta}
= -\sum_{\omega\in\Omega} q(\omega)\,\frac{\partial}{\partial\theta}\!\left(\frac{1}{p(\omega\mid\theta)}\frac{\partial p(\omega\mid\theta)}{\partial\theta}\right)\\
&= -\sum_{\omega\in\Omega}\frac{q(\omega)}{p(\omega\mid\theta)}\frac{\partial^2 p(\omega\mid\theta)}{\partial\theta^2}
 + \sum_{\omega\in\Omega}\frac{q(\omega)}{p(\omega\mid\theta)^2}\frac{\partial p(\omega\mid\theta)}{\partial\theta}\frac{\partial p(\omega\mid\theta)}{\partial\theta}^{\!\top}\\
&= -\sum_{\omega\in\Omega}\frac{q(\omega)}{p(\omega\mid\theta)}\frac{\partial^2 p(\omega\mid\theta)}{\partial\theta^2}
 + \sum_{\omega\in\Omega} q(\omega)\,\frac{\partial\ln p(\omega\mid\theta)}{\partial\theta}\frac{\partial\ln p(\omega\mid\theta)}{\partial\theta}^{\!\top}. \tag{5}
\end{aligned}$$

Next we compute a second-order Taylor expansion of $g_q(\theta+\Delta)$ around $g_q(\theta)$:

$$g_{p(\theta)}(\theta+\Delta) \;\overset{\text{Taylor}_2}{\approx}\; g_{p(\theta)}(\theta) + \Delta^\top\frac{\partial g_{p(\theta)}(\theta)}{\partial\theta} + \frac{1}{2}\Delta^\top\frac{\partial^2 g_{p(\theta)}(\theta)}{\partial\theta^2}\Delta. \tag{6}$$

Notice that $g_{p(\theta)}(\theta) = D_{\mathrm{KL}}\!\left(p(\theta)\,\|\,p(\theta)\right) = 0$, and by (4),

$$\Delta^\top\frac{\partial g_{p(\theta)}(\theta)}{\partial\theta}
= -\Delta^\top\sum_{\omega\in\Omega}\frac{p(\omega\mid\theta)}{p(\omega\mid\theta)}\frac{\partial p(\omega\mid\theta)}{\partial\theta}
= -\Delta^\top\frac{\partial}{\partial\theta}\sum_{\omega\in\Omega} p(\omega\mid\theta)
\;\overset{(a)}{=}\; 0,$$

where (a) holds because $\sum_{\omega\in\Omega} p(\omega\mid\theta) = 1$, so

$$\frac{\partial}{\partial\theta}\sum_{\omega\in\Omega} p(\omega\mid\theta) = \frac{\partial 1}{\partial\theta} = 0. \tag{7}$$

Thus, the first two terms on the right side of (6) are zero, and so:

$$g_{p(\theta)}(\theta+\Delta) \;\overset{\text{Taylor}_2}{\approx}\; \frac{1}{2}\Delta^\top\frac{\partial^2 g_{p(\theta)}(\theta)}{\partial\theta^2}\Delta. \tag{8}$$

Next we focus on the Hessian, (5), with $q = p(\theta)$:

$$\frac{\partial^2 g_{p(\theta)}(\theta)}{\partial\theta^2}
= \underbrace{-\sum_{\omega\in\Omega}\frac{p(\omega\mid\theta)}{p(\omega\mid\theta)}\frac{\partial^2 p(\omega\mid\theta)}{\partial\theta^2}}_{\overset{(a)}{=}\,0}
+ \sum_{\omega\in\Omega} p(\omega\mid\theta)\,\frac{\partial\ln p(\omega\mid\theta)}{\partial\theta}\frac{\partial\ln p(\omega\mid\theta)}{\partial\theta}^{\!\top}
= F(\theta),$$

where (a) comes from taking the derivative of both sides of (7) with respect to $\theta$. Substituting this into (8), we have that

$$g_{p(\theta)}(\theta+\Delta) \;\overset{\text{Taylor}_2}{\approx}\; \frac{1}{2}\Delta^\top F(\theta)\Delta.$$

In this section we show that $\Delta^\top E(\theta)\Delta$ is a second-order Taylor approximation of $D_E\!\left(p(\theta), p(\theta+\Delta)\right)^2$. First, let $g_q(\theta) := D_E\!\left(q, p(\theta)\right)^2 = 2\,$…
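The relation derived above is easy to check numerically. The sketch below is an illustration of my own (not code from the paper): it uses a softmax-parameterized categorical distribution, for which the score is $\partial\ln p(\omega\mid\theta)/\partial\theta = e_\omega - p(\theta)$, and compares $D_{\mathrm{KL}}\!\left(p(\theta)\,\|\,p(\theta+\Delta)\right)$ with $\frac{1}{2}\Delta^\top F(\theta)\Delta$ for a small $\Delta$; all function names are illustrative.

```python
# Numerical sketch (illustrative, not from the paper): for a softmax-parameterized
# categorical distribution, check that D_KL(p(theta) || p(theta + delta)) is close
# to (1/2) delta^T F(theta) delta when delta is small.
import numpy as np

def softmax(theta):
    z = np.exp(theta - np.max(theta))
    return z / z.sum()

def kl(p, q):
    # D_KL(p || q) for categorical distributions given as probability vectors.
    return float(np.sum(p * np.log(p / q)))

def fisher(theta):
    # F(theta) = sum_omega p(omega|theta) * score(omega) score(omega)^T, where the
    # softmax score is d ln p(omega|theta)/d theta = e_omega - p(theta).
    p = softmax(theta)
    n = len(theta)
    F = np.zeros((n, n))
    for omega in range(n):
        score = np.eye(n)[omega] - p
        F += p[omega] * np.outer(score, score)
    return F

rng = np.random.default_rng(0)
theta = rng.normal(size=4)
delta = 1e-2 * rng.normal(size=4)

exact = kl(softmax(theta), softmax(theta + delta))
approx = 0.5 * delta @ fisher(theta) @ delta
print(exact, approx)  # the two values agree up to O(||delta||^3) terms
```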


Similar resources

Energetic Natural Gradient Descent

We propose a new class of algorithms for minimizing or maximizing functions of parametric probabilistic models. These new algorithms are natural gradient algorithms that leverage more information than prior methods by using a new metric tensor in place of the commonly used Fisher information matrix. This new metric tensor is derived by computing directions of steepest ascent where the distance ...
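For context, below is a minimal sketch of the standard natural-gradient update, $\theta \leftarrow \theta - \alpha F(\theta)^{-1}\nabla L(\theta)$, which this work generalizes by replacing the Fisher matrix with a different metric tensor. The function names and the damping term are illustrative assumptions, not the paper's API.

```python
# Minimal sketch of a generic natural-gradient step (illustrative; the paper's
# contribution is to swap the Fisher matrix for its "energetic" metric tensor).
import numpy as np

def natural_gradient_step(theta, loss_grad, metric, step_size=0.1, damping=1e-6):
    """Take one steepest-descent step measured in the geometry induced by `metric`."""
    g = loss_grad(theta)                      # ordinary (Euclidean) gradient of the loss
    M = metric(theta)                         # Fisher matrix, or another metric tensor
    M = M + damping * np.eye(len(theta))      # damping keeps the linear solve stable
    direction = np.linalg.solve(M, g)         # natural gradient direction M^{-1} g
    return theta - step_size * direction
```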


An eigenvalue study on the sufficient descent property of a modified Polak-Ribière-Polyak conjugate gradient method

Based on an eigenvalue analysis, a new proof for the sufficient descent property of the modified Polak-Ribière-Polyak conjugate gradient method proposed by Yu et al. is presented.


Extensions of the Hestenes-Stiefel and Polak-Ribiere-Polyak conjugate gradient methods with sufficient descent property

Using search directions of a recent class of three-term conjugate gradient methods, modified versions of the Hestenes-Stiefel and Polak-Ribiere-Polyak methods are proposed which satisfy the sufficient descent condition. The methods are shown to be globally convergent when the line search fulfills the (strong) Wolfe conditions. Numerical experiments are done on a set of CUTEr unconstrained opti...


A Note on the Descent Property Theorem for the Hybrid Conjugate Gradient Algorithm CCOMB Proposed by Andrei

In [1] (Hybrid Conjugate Gradient Algorithm for Unconstrained Optimization, J. Optimization Theory Appl. 141 (2009) 249 - 264), an efficient hybrid conjugate gradient algorithm, the CCOMB algorithm, is proposed for solving unconstrained optimization problems. However, the proof of Theorem 2.1 in [1] is incorrect due to an erroneous inequality that was used to indicate the descent property for the s...


Natural Gradient Descent for Training Stochastic Complex-Valued Neural Networks

In this paper, the natural gradient descent method for multilayer stochastic complex-valued neural networks is considered, and the natural gradient is given for a single stochastic complex-valued neuron as an example. Since the space of the learnable parameters of stochastic complex-valued neural networks is not the Euclidean space but a curved manifold, the complex-valued natural gradient ...


Scaling up Natural Gradient by Sparsely Factorizing the Inverse Fisher Matrix

Second-order optimization methods, such as natural gradient, are difficult to apply to high-dimensional problems, because they require approximately solving large linear systems. We present FActorized Natural Gradient (FANG), an approximation to natural gradient descent where the Fisher matrix is approximated with a Gaussian graphical model whose precision matrix can be computed efficiently. We ...




Publication date: 2016